Search | VHL Regional Portal

1.

AbFlex: designing antibody complementarity determining regions with flexible CDR definition.

Jeon, Woosung; Kim, Dongsup.

Bioinformatics ; 40(3), 2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38449295

ABSTRACT

MOTIVATION: Antibodies are proteins that the immune system produces in response to foreign pathogens. Designing antibodies that specifically bind to antigens is a key step in developing antibody therapeutics. The complementarity determining regions (CDRs) of the antibody are mainly responsible for binding to the target antigen, and therefore must be designed to recognize the antigen. RESULTS: We develop an antibody design model, AbFlex, that exhibits state-of-the-art performance in terms of structure prediction accuracy and amino acid recovery rate. Furthermore, >38% of newly designed antibody models are estimated to have better binding energies for their antigens than wild types. The effectiveness of the model is attributed to two different strategies that are developed to overcome the difficulty associated with the scarcity of antibody-antigen complex structure data. One strategy is to use an equivariant graph neural network model that is more data-efficient. More importantly, a new data augmentation strategy based on the flexible definition of CDRs significantly increases the performance of the CDR prediction model. AVAILABILITY AND IMPLEMENTATION: The source code and implementation are available at https://github.com/wsjeon92/AbFlex.

Subjects

Antigen-Antibody Complex; Complementarity Determining Regions; Complementarity Determining Regions/chemistry; Complementarity Determining Regions/metabolism; Amino Acid Sequence; Models, Molecular; Antigen-Antibody Complex/chemistry; Antigens

2.

Applying network link prediction in drug discovery: an overview of the literature.

Son, Jeongtae; Kim, Dongsup.

Expert Opin Drug Discov ; 19(1): 43-56, 2024.

Article in English | MEDLINE | ID: mdl-37794688

ABSTRACT

INTRODUCTION: Network representation can give a holistic view of relationships for biomedical entities through network topology. Link prediction estimates the probability of link formation between the pair of unconnected nodes. In the drug discovery process, the link prediction method not only enables the detection of connectivity patterns but also predicts the effects of one biomedical entity to multiple entities simultaneously and vice versa, which is useful for many applications. AREAS COVERED: The authors provide a comprehensive overview of network link prediction in drug discovery. Link prediction methodologies such as similarity-based approaches, embedding-based approaches, probabilistic model-based approaches, and preprocessing methods are summarized with examples. In addition to describing their properties and limitations, the authors discuss the applications of link prediction in drug discovery based on the relationship between biomedical concepts. EXPERT OPINION: Link prediction is a powerful method to infer the existence of novel relationships in drug discovery. However, link prediction has been hampered by the sparsity of data and the lack of negative links in biomedical networks. With preprocessing to balance positive and negative samples and the collection of more data, the authors believe it is possible to develop more reliable link prediction methods that can become invaluable tools for successful drug discovery.

Subjects

Drug Discovery; Models, Statistical; Humans; Drug Discovery/methods
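As an illustration of the similarity-based family of link prediction methods mentioned in the abstract above, the following is a minimal sketch that scores each unconnected node pair by the Jaccard overlap of its neighbor sets. The toy network and the function name `jaccard_scores` are hypothetical and are not taken from the review; the Jaccard coefficient is only one of several similarity indices in common use.

```python
from itertools import combinations


def jaccard_scores(edges):
    """Score every unconnected node pair by the Jaccard overlap of its neighbor sets."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # the pair is already linked, so there is nothing to predict
        shared = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(shared) / len(union) if union else 0.0
    return scores


# Hypothetical toy network; node names carry no biological meaning.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = jaccard_scores(edges)
best_pair = max(scores, key=scores.get)
print(best_pair, round(scores[best_pair], 3))
```

Pairs with a high score share most of their neighbors and are therefore ranked as the most plausible missing links. Pairs with no common neighbor score zero, which is one way the data sparsity noted in the abstract limits this kind of method.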

3.

G-RANK: an equivariant graph neural network for the scoring of protein-protein docking models.

Kim, Ha Young; Kim, Sungsik; Park, Woong-Yang; Kim, Dongsup.

Bioinform Adv ; 3(1): vbad011, 2023.

Article in English | MEDLINE | ID: mdl-36818727

ABSTRACT

Motivation: Protein complex structure prediction is important for many applications in bioengineering. A widely used method for predicting the structure of protein complexes is computational docking. Although many tools for scoring protein-protein docking models have been developed, it is still a challenge to accurately identify near-native models for unknown protein complexes. A recently proposed model called the geometric vector perceptron-graph neural network (GVP-GNN), a subtype of equivariant graph neural networks, has demonstrated success in various 3D molecular structure modeling tasks. Results: Herein, we present G-RANK, a GVP-GNN-based method for the scoring of protein-protein docking models. When evaluated on two different test datasets, G-RANK achieved a performance competitive with or better than the state-of-the-art scoring functions. We expect G-RANK to be a useful tool for various applications in biological engineering. Availability and implementation: Source code is available at https://github.com/ha01994/grank. Contact: kds@kaist.ac.kr. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

4.

Prediction of drug-target interactions through multi-task learning.

Moon, Chaeyoung; Kim, Dongsup.

Sci Rep ; 12(1): 18323, 2022 10 31.

Article in English | MEDLINE | ID: mdl-36316405

ABSTRACT

Identifying the binding between the target proteins and molecules is essential in drug discovery. The multi-task learning method has been introduced to facilitate knowledge sharing among tasks when the amount of information for each task is small. However, multi-task learning sometimes worsens the overall performance or generates a trade-off between individual task's performance. In this study, we propose a general multi-task learning scheme that not only increases the average performance but also minimizes individual performance degradation, through group selection and knowledge distillation. The groups are selected on the basis of chemical similarity between ligand sets of targets, and the similar targets in the same groups are trained together. During training, we apply knowledge distillation with teacher annealing. The multi-task learning models are guided by the predictions of the single-task learning models. This method results in higher average performance than that from single-task learning and classic multi-task learning. Further analysis reveals that multi-task learning is particularly effective for low performance tasks, and knowledge distillation helps the model avoid the degradation in individual task performance in multi-task learning.

Subjects

Learning; Machine Learning; Drug Discovery; Ligands; Proteins
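The abstract above mentions knowledge distillation with teacher annealing, in which single-task models guide the multi-task model. A minimal sketch of one common formulation follows: the training target is a mix of the teacher's prediction and the true label, and the weight shifts from the teacher to the label as training proceeds. The linear schedule and the function name `annealed_target` are assumptions made for illustration; the abstract does not specify the schedule the authors used.

```python
def annealed_target(label, teacher_pred, step, total_steps):
    """Blend the true label with the teacher prediction.

    The mixing weight lam grows linearly from 0 to 1 (an assumed schedule),
    so early training follows the teacher and late training follows the label.
    """
    lam = min(1.0, step / total_steps)
    return lam * label + (1.0 - lam) * teacher_pred


# Hypothetical values: an active compound (label 1.0) that the
# single-task teacher scores at 0.6.
for step in (0, 50, 100):
    print(step, annealed_target(1.0, 0.6, step, total_steps=100))
```

At step 0 the student is trained only on the teacher's output, and at the final step only on the ground-truth label, so the teacher constrains the student early on without capping its final performance.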

5.

DeepLUCIA: predicting tissue-specific chromatin loops using Deep Learning-based Universal Chromatin Interaction Annotator.

Yang, Dongchan; Chung, Taesu; Kim, Dongsup.

Bioinformatics ; 38(14): 3501-3512, 2022 07 11.

Article in English | MEDLINE | ID: mdl-35640981

ABSTRACT

MOTIVATION: The importance of chromatin loops in gene regulation is broadly accepted. There are mainly two approaches to predict chromatin loops: transcription factor (TF) binding-dependent approach and genomic variation-based approach. However, neither of these approaches provides an adequate understanding of gene regulation in human tissues. To address this issue, we developed a deep learning-based chromatin loop prediction model called Deep Learning-based Universal Chromatin Interaction Annotator (DeepLUCIA). RESULTS: Although DeepLUCIA does not use TF binding profile data which previous TF binding-dependent methods critically rely on, its prediction accuracies are comparable to those of the previous TF binding-dependent methods. More importantly, DeepLUCIA enables the tissue-specific chromatin loop predictions from tissue-specific epigenomes that cannot be handled by genomic variation-based approach. We demonstrated the utility of the DeepLUCIA by predicting several novel target genes of SNPs identified in genome-wide association studies targeting Brugada syndrome, COVID-19 severity and age-related macular degeneration. AVAILABILITY AND IMPLEMENTATION: DeepLUCIA is freely available at https://github.com/bcbl-kaist/DeepLUCIA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

COVID-19; Deep Learning; Humans; Chromatin; Genome-Wide Association Study; Genomics/methods

6.

Brain physiome: A concept bridging in vitro 3D brain models and in silico models for predicting drug toxicity in the brain.

Seo, Yoojin; Bang, Seokyoung; Son, Jeongtae; Kim, Dongsup; Jeong, Yong; Kim, Pilnam; Yang, Jihun; Eom, Joon-Ho; Choi, Nakwon; Kim, Hong Nam.

Bioact Mater ; 13: 135-148, 2022 Jul.

Article in English | MEDLINE | ID: mdl-35224297

ABSTRACT

In the last few decades, adverse reactions to pharmaceuticals have been evaluated using 2D in vitro models and animal models. However, with increasing computational power, and as the key drivers of cellular behavior have been identified, in silico models have emerged. These models are time-efficient and cost-effective, but the prediction of adverse reactions to unknown drugs using these models requires relevant experimental input. Accordingly, the physiome concept has emerged to bridge experimental datasets with in silico models. The brain physiome describes the systemic interactions of its components, which are organized into a multilevel hierarchy. Because of the limitations in obtaining experimental data corresponding to each physiome component from 2D in vitro models and animal models, 3D in vitro brain models, including brain organoids and brain-on-a-chip, have been developed. In this review, we present the concept of the brain physiome and its hierarchical organization, including cell- and tissue-level organizations. We also summarize recently developed 3D in vitro brain models and link them with the elements of the brain physiome as a guideline for dataset collection. The connection between in vitro 3D brain models and in silico modeling will lead to the establishment of cost-effective and time-efficient in silico models for the prediction of the safety of unknown drugs.

7.

Rising of LOXHD1 as a signature causative gene of down-sloping hearing loss in people in their teens and 20s.

Kim, Bong Jik; Jeon, Hyoung Won; Jeon, Woosung; Han, Jin Hee; Oh, Jayoung; Yi, Nayoung; Kim, Min Young; Kim, Minah; Kim, Justin Namju; Kim, Bo Hye; Hyon, Joon Young; Kim, Dongsup; Koo, Ja-Won; Oh, Doo-Yi; Choi, Byung Yoon.

J Med Genet ; 59(5): 470-480, 2022 05.

Article in English | MEDLINE | ID: mdl-33753533

ABSTRACT

BACKGROUND: Down-sloping sensorineural hearing loss (SNHL) in people in their teens and 20s hampers efficient learning and communication and in-depth social interactions. Nonetheless, its aetiology remains largely unclear, with the exception of some potential causative genes, none of which stands out especially in people in their teens and 20s. Here, we examined the role and genotype-phenotype correlation of lipoxygenase homology domain 1 (LOXHD1) in down-sloping SNHL through a cohort study. METHODS: Based on the Seoul National University Bundang Hospital (SNUBH) genetic deafness cohort, in which the patients show varying degrees of deafness and different onset ages (n=1055), we have established the 'SNUBH Teenager-Young Adult Down-sloping SNHL' cohort (10-35 years old) (n=47), all of whom underwent exome sequencing. Three-dimensional molecular modelling, minigene splicing assay and short tandem repeat marker genotyping were performed, and medical records were reviewed. RESULTS: LOXHD1 accounted for 33.3% of all genetically diagnosed cases of down-sloping SNHL (n=18) and 12.8% of cases in the whole down-sloping SNHL cohort (n=47) of young adults. We identified a potential common founder allele, as well as an interesting genotype-phenotype correlation. We also showed that transcript 6 is necessary and probably sufficient for normal hearing. CONCLUSIONS: LOXHD1 exceeds other genes in its contribution to down-sloping SNHL in young adults, rising as a signature causative gene, and shows a potential but interesting genotype-phenotype correlation.

Subjects

Deafness; Hearing Loss, Sensorineural; Hearing Loss; Adolescent; Adult; Carrier Proteins/genetics; Cohort Studies; Hearing Loss, Sensorineural/diagnosis; Hearing Loss, Sensorineural/epidemiology; Hearing Loss, Sensorineural/genetics; Humans; Lipoxygenase; Young Adult

8.

An enhanced variant effect predictor based on a deep generative model and the Born-Again Networks.

Kim, Ha Young; Jeon, Woosung; Kim, Dongsup.

Sci Rep ; 11(1): 19127, 2021 09 27.

Article in English | MEDLINE | ID: mdl-34580383

ABSTRACT

The development of an accurate and reliable variant effect prediction tool is important for research in human genetic diseases. A large number of predictors have been developed towards this goal, yet many of these predictors suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a form of knowledge distillation technique known as the Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve an improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the problem of data circularity which is common across supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows an outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server to predict variant effects using MTBAN, freely accessible at http://mtban.kaist.ac.kr . To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for the prediction of deleteriousness of variants.

9.

Development of a graph convolutional neural network model for efficient prediction of protein-ligand binding affinities.

Son, Jeongtae; Kim, Dongsup.

PLoS One ; 16(4): e0249404, 2021.

Article in English | MEDLINE | ID: mdl-33831016

ABSTRACT

Prediction of protein-ligand interactions is a critical step during the initial phase of drug discovery. We propose a novel deep-learning-based prediction model based on a graph convolutional neural network, named GraphBAR, for protein-ligand binding affinity. Graph convolutional neural networks reduce the computational time and resources that are normally required by the traditional convolutional neural network models. In this technique, the structure of a protein-ligand complex is represented as a graph of multiple adjacency matrices whose entries are affected by distances, and a feature matrix that describes the molecular properties of the atoms. We evaluated the predictive power of GraphBAR for protein-ligand binding affinities by using PDBbind datasets and proved the efficiency of the graph convolution. Given the computational efficiency of graph convolutional neural networks, we also performed data augmentation to improve the model performance. We found that data augmentation with docking simulation data could improve the prediction accuracy although the improvement seems not to be significant. The high prediction performance and speed of GraphBAR suggest that such networks can serve as valuable tools in drug discovery.

Subjects

Computer Graphics; Neural Networks, Computer; Proteins/metabolism; Ligands; Protein Binding

10.

Autonomous molecule generation using reinforcement learning and docking to develop potential novel inhibitors.

Jeon, Woosung; Kim, Dongsup.

Sci Rep ; 10(1): 22104, 2020 12 16.

Article in English | MEDLINE | ID: mdl-33328504

ABSTRACT

We developed a computational method named Molecule Optimization by Reinforcement Learning and Docking (MORLD) that automatically generates and optimizes lead compounds by combining reinforcement learning and docking to develop predicted novel inhibitors. This model requires only a target protein structure and directly modifies ligand structures to obtain higher predicted binding affinity for the target protein without any other training data. Using MORLD, we were able to generate potential novel inhibitors against discoidin domain receptor 1 kinase (DDR1) in less than 2 days on a moderate computer. We also demonstrated MORLD's ability to generate predicted novel agonists for the D4 dopamine receptor (D4DR) from scratch without virtual screening on an ultra large compound library. The free web server is available at http://morld.kaist.ac.kr .

11.

CRDS: Consensus Reverse Docking System for target fishing.

Lee, Aeri; Kim, Dongsup.

Bioinformatics ; 36(3): 959-960, 2020 02 01.

Article in English | MEDLINE | ID: mdl-31432077

ABSTRACT

MOTIVATION: Identification of putative drug targets is a critical step for explaining the mechanism of drug action against multiple targets, finding new therapeutic indications for existing drugs and unveiling the adverse drug reactions. One important approach is to use the molecular docking. However, its widespread utilization has been hindered by the lack of easy-to-use public servers. Therefore, it is vital to develop a streamlined computational tool for target prediction by molecular docking on a large scale. RESULTS: We present a fully automated web tool named Consensus Reverse Docking System (CRDS), which predicts potential interaction sites for a given drug. To improve hit rates, we developed a strategy of consensus scoring. CRDS carries out reverse docking against 5254 candidate protein structures using three different scoring functions (GoldScore, Vina and LeDock from GOLD version 5.7.1, AutoDock Vina version 1.1.2 and LeDock version 1.0, respectively), and those scores are combined into a single score named Consensus Docking Score (CDS). The web server provides the list of top 50 predicted interaction sites, docking conformations, 10 most significant pathways and the distribution of consensus scores. AVAILABILITY AND IMPLEMENTATION: The web server is available at http://pbil.kaist.ac.kr/CRDS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computers; Proteins; Consensus; Ligands; Molecular Conformation; Molecular Docking Simulation
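The abstract above states that scores from three docking programs are combined into a single Consensus Docking Score, but it does not give the formula. The sketch below shows one generic way to build a consensus, assuming a mean of per-program z-scores; it is not the CDS definition used by CRDS. The score values are made up, and the orientation flags reflect that GoldScore is a fitness value (higher is better) whereas Vina and LeDock report binding energies (lower is better).

```python
from statistics import mean, pstdev


def zscores(values, higher_is_better=True):
    """Standardize scores across targets and orient them so larger means better."""
    mu, sigma = mean(values), pstdev(values)
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma if sigma else 0.0 for v in values]


def consensus(score_table, orientation):
    """Return the mean z-score per target over all scoring functions."""
    standardized = [
        zscores(score_table[name], orientation[name]) for name in score_table
    ]
    return [mean(column) for column in zip(*standardized)]


# Made-up scores for three hypothetical candidate targets (index 0, 1, 2).
scores = {
    "GoldScore": [60.0, 45.0, 30.0],
    "Vina": [-9.0, -7.5, -6.0],
    "LeDock": [-8.0, -6.0, -4.0],
}
orientation = {"GoldScore": True, "Vina": False, "LeDock": False}

consensus_scores = consensus(scores, orientation)
ranking = sorted(range(len(consensus_scores)),
                 key=lambda i: consensus_scores[i], reverse=True)
print(ranking)
```

Standardizing before averaging keeps any one program's scale from dominating the combined score, which is the general motivation for consensus scoring regardless of the exact formula.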

12.

Prediction of mutation effects using a deep temporal convolutional network.

Kim, Ha Young; Kim, Dongsup.

Bioinformatics ; 36(7): 2047-2052, 2020 04 01.

Article in English | MEDLINE | ID: mdl-31746978

ABSTRACT

MOTIVATION: Accurate prediction of the effects of genetic variation is a major goal in biological research. Towards this goal, numerous machine learning models have been developed to learn information from evolutionary sequence data. The most effective method so far is a deep generative model based on the variational autoencoder (VAE) that models the distributions using a latent variable. In this study, we propose a deep autoregressive generative model named mutationTCN, which employs dilated causal convolutions and attention mechanism for the modeling of inter-residue correlations in a biological sequence. RESULTS: We show that this model is competitive with the VAE model when tested against a set of 42 high-throughput mutation scan experiments, with the mean improvement in Spearman rank correlation ~0.023. In particular, our model can more efficiently capture information from multiple sequence alignments with lower effective number of sequences, such as in viral sequence families, compared with the latent variable model. Also, we extend this architecture to a semi-supervised learning framework, which shows high prediction accuracy. We show that our model enables a direct optimization of the data likelihood and allows for a simple and stable training process. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/ha01994/mutationTCN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Machine Learning; Neural Networks, Computer; Mutation; Sequence Alignment; Software
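The abstract above reports performance as Spearman rank correlation between model predictions and mutation scan measurements. For readers unfamiliar with the metric, the following is a minimal sketch that computes it as the Pearson correlation of ranks, with tied values sharing their average rank. The example numbers are invented and do not come from the paper.

```python
def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(rank(x), rank(y))


# Invented example: predicted mutation effects vs. measured fitness.
predicted = [0.1, 0.4, 0.35, 0.8]
measured = [1.0, 2.0, 3.0, 4.0]
rho = spearman(predicted, measured)
print(round(rho, 3))
```

Because only the ordering matters, the metric rewards a predictor that ranks variants correctly even when its raw scores are on a different scale from the experimental readout.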

13.

In-Silico Molecular Binding Prediction for Human Drug Targets Using Deep Neural Multi-Task Learning.

Lee, Kyoungyeul; Kim, Dongsup.

Genes (Basel) ; 10(11), 2019 11 07.

Article in English | MEDLINE | ID: mdl-31703452

ABSTRACT

In in-silico prediction for molecular binding of human genomes, promising results have been demonstrated by deep neural multi-task learning due to its strength in training tasks with imbalanced data and its ability to avoid over-fitting. Although the interrelation between tasks is known to be important for successful multi-task learning, its adverse effect has been underestimated. In this study, we used molecular interaction data of human targets from ChEMBL to train and test various multi-task and single-task networks and examined the effectiveness of multi-task learning for different compositions of targets. Targets were clustered based on sequence similarity in their binding domains and various target sets from clusters were chosen. By comparing the performance of deep neural architectures for each target set, we found that similarity within a target set is highly important for reliable multi-task learning. For a diverse target set or overall human targets, the performance of multi-task learning was lower than single-task learning, but outperformed single-task for the target set containing similar targets. From this insight, we developed Multiple Partial Multi-Task learning, which is suitable for binding prediction for human drug targets.

Subjects

Deep Learning; Drug Discovery/methods; Small Molecule Libraries/pharmacology; Databases, Chemical; Humans; Molecular Docking Simulation/methods; Protein Binding; Small Molecule Libraries/chemistry

14.

Optimization of a microarray for fission yeast.

Kim, Dong-Uk; Lee, Minho; Han, Sangjo; Nam, Miyoung; Lee, Sol; Lee, Jaewoong; Woo, Jihye; Kim, Dongsup; Hoe, Kwang-Lae.

Genomics Inform ; 17(3): e28, 2019 Sep.

Article in English | MEDLINE | ID: mdl-31610624

ABSTRACT

Bar-code (tag) microarrays of yeast gene-deletion collections facilitate the systematic identification of genes required for growth in any condition of interest. Anti-sense strands of amplified bar-codes hybridize with ~10,000 (5,000 each for up- and down-tags) different kinds of sense-strand probes on an array. In this study, we optimized the hybridization processes of an array for fission yeast. Compared to the first version of the array (11 µm, 100K) consisting of three sectors with probe pairs (perfect match and mismatch), the second version (11 µm, 48K) could represent ~10,000 up-/down-tags in quadruplicate along with 1,508 negative controls in quadruplicate and a single set of 1,000 unique negative controls at random dispersed positions without mismatch pairs. For PCR, the optimal annealing temperature (maximizing yield and minimizing extra bands) was 58°C for both tags. Intriguingly, up-tags required 3× higher amounts of blocking oligonucleotides than down-tags. A 1:1 mix ratio between up- and down-tags was satisfactory. A lower temperature (25°C) was optimal for cultivation instead of a normal temperature (30°C) because of extra temperature-sensitive mutants in a subset of the deletion library. Activation of frozen pooled cells for >1 day showed better resolution of intensity than no activation. A tag intensity analysis showed that tag(s) of 4,316 of the 4,526 strains tested were represented at least once; 3,706 strains were represented by both tags, 4,072 strains by up-tags only, and 3,950 strains by down-tags only. The results indicate that this microarray will be a powerful analytical platform for elucidating currently unknown gene functions.

15.

A compendium of promoter-centered long-range chromatin interactions in the human genome.

Jung, Inkyung; Schmitt, Anthony; Diao, Yarui; Lee, Andrew J; Liu, Tristin; Yang, Dongchan; Tan, Catherine; Eom, Junghyun; Chan, Marilynn; Chee, Sora; Chiang, Zachary; Kim, Changyoun; Masliah, Eliezer; Barr, Cathy L; Li, Bin; Kuan, Samantha; Kim, Dongsup; Ren, Bing.

Nat Genet ; 51(10): 1442-1449, 2019 10.

Article in English | MEDLINE | ID: mdl-31501517

ABSTRACT

A large number of putative cis-regulatory sequences have been annotated in the human genome, but the genes they control remain poorly defined. To bridge this gap, we generate maps of long-range chromatin interactions centered on 18,943 well-annotated promoters for protein-coding genes in 27 human cell/tissue types. We use this information to infer the target genes of 70,329 candidate regulatory elements and suggest potential regulatory function for 27,325 noncoding sequence variants associated with 2,117 physiological traits and diseases. Integrative analysis of these promoter-centered interactome maps reveals widespread enhancer-like promoters involved in gene regulation and common molecular pathways underlying distinct groups of human traits and diseases.

Subjects

Chromatin/metabolism; Gene Expression Regulation; Genome, Human; Promoter Regions, Genetic; Regulatory Sequences, Nucleic Acid; Transcription Factors/metabolism; Chromatin/genetics; Genomics; Humans; Transcription Factors/genetics

16.

Global Analysis of Intercellular Homeodomain Protein Transfer.

Lee, Eun Jung; Kim, Namsuk; Park, Jun Woo; Kang, Kyung Hwa; Kim, Woo-Il; Sim, Nam Suk; Jeong, Chan-Seok; Blackshaw, Seth; Vidal, Marc; Huh, Sung-Oh; Kim, Dongsup; Lee, Jeong Ho; Kim, Jin Woo.

Cell Rep ; 28(3): 712-722.e3, 2019 07 16.

Article in English | MEDLINE | ID: mdl-31315049

ABSTRACT

The homeodomain is found in hundreds of transcription factors that play roles in fate determination via cell-autonomous regulation of gene expression. However, some homeodomain-containing proteins (HPs) are thought to be secreted and penetrate neighboring cells to affect the recipient cell fate. To determine whether this is a general characteristic of HPs, we carried out a large-scale validation for intercellular transfer of HPs. Our screening reveals that intercellular transfer is a general feature of HPs, but it occurs in a cell-context-sensitive manner. We also found the secretion is not solely a function of the homeodomain, but it is supported by external motifs containing hydrophobic residues. Thus, mutations of hydrophobic residues of HPs abrogate secretion and consequently interfere with HP function in recipient cells. Collectively, our study proposes that HP transfer is an intercellular communication method that couples the functions of interacting cells.

Subjects

Cell Communication/genetics; Homeodomain Proteins/metabolism; Protein Transport/genetics; Amino Acid Motifs/genetics; Animals; Brain/embryology; Brain/metabolism; Cell Line; Female; High-Throughput Screening Assays; Homeodomain Proteins/genetics; Humans; Hydrophobic and Hydrophilic Interactions; Mice; Mice, Inbred C57BL; Mice, Knockout; Mutation; Pregnancy; Retina/metabolism

17.

FP2VEC: a new molecular featurizer for learning molecular properties.

Jeon, Woosung; Kim, Dongsup.

Bioinformatics ; 35(23): 4979-4985, 2019 12 01.

Article in English | MEDLINE | ID: mdl-31070725

ABSTRACT

MOTIVATION: One of the most successful methods for predicting the properties of chemical compounds is the quantitative structure-activity relationship (QSAR) methods. The prediction accuracy of QSAR models has recently been greatly improved by employing deep learning technology. Especially, newly developed molecular featurizers based on graph convolution operations on molecular graphs significantly outperform the conventional extended connectivity fingerprints (ECFP) feature in both classification and regression tasks, indicating that it is critical to develop more effective new featurizers to fully realize the power of deep learning techniques. Motivated by the fact that there is a clear analogy between chemical compounds and natural languages, this work develops a new molecular featurizer, FP2VEC, which represents a chemical compound as a set of trainable embedding vectors. RESULTS: To implement and test our new featurizer, we build a QSAR model using a simple convolutional neural network (CNN) architecture that has been successfully used for natural language processing tasks such as sentence classification task. By testing our new method on several benchmark datasets, we demonstrate that the combination of FP2VEC and CNN model can achieve competitive results in many QSAR tasks, especially in classification tasks. We also demonstrate that the FP2VEC model is especially effective for multitask learning. AVAILABILITY AND IMPLEMENTATION: FP2VEC is available from https://github.com/wsjeon92/FP2VEC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Deep Learning; Natural Language Processing; Quantitative Structure-Activity Relationship; Software

18.

Prediction of binding property of RNA-binding proteins using multi-sized filters and multi-modal deep convolutional neural network.

Chung, Taesu; Kim, Dongsup.

PLoS One ; 14(4): e0216257, 2019.

Article in English | MEDLINE | ID: mdl-31026297

ABSTRACT

RNA-binding proteins (RBPs) are important in gene expression regulations by post-transcriptional control of RNAs and immune system development and its function. Due to the help of sequencing technology, numerous RNA sequences are newly discovered without knowing their binding partner RBPs. Therefore, demands for accurate prediction method for RBP binding sites are increasing. There are many attempts for RBP binding site predictions using various machine-learning techniques combined with various RNA features. In this work, we present a new deep convolution neural network model trained on CLIP-seq datasets using multi-sized filters and multi-modal features to predict the binding property of RBPs. With this model, we integrated sequence and structure information to extract sequence motifs, structure motifs, and combined motifs at the same time. The RBP binding site prediction on RBP-24 dataset was compared with two multi-modal methods, GraphProt and Deepnet-rbp, using area under curve (AUC) of receiver-operating characteristics (ROC). Our method (average AUC = 0.920) outperformed 20 RBPs with GraphProt (average AUC = 0.888) and 15 RBP with Deepnet-rbp (average AUC = 0.902). The improvement was achieved by using multi-sized convolution filters, where average relative error reduction was 17%. By introducing new RNA structure representation, structure probability matrix, average relative error was reduced by 3% when compared to one-hot encoded secondary structure representation. Interestingly, structure probability matrix was more effective on ALKBH5, where relative error reduction was 30%. We developed new sequence motif enrichment method, which we stated as response enrichment method. We successfully enriched sequence motif for 12 RBPs, which had high resemblance with other literature evidences, RBPgroup and CISBP-RNA. Finally by analyzing these results altogether, we found intricate interplay between sequence motif and structure motif, which agreed with other researches.

Subjects

Deep Learning; Neural Networks, Computer; Area Under Curve; Base Sequence; Nucleotide Motifs/genetics; Protein Binding

19.

Hypomorphic Mutations in TONSL Cause SPONASTRIME Dysplasia.

Chang, Hae Ryung; Cho, Sung Yoon; Lee, Jae Hoon; Lee, Eunkyung; Seo, Jieun; Lee, Hye Ran; Cavalcanti, Denise P; Mäkitie, Outi; Valta, Helena; Girisha, Katta M; Lee, Chung; Neethukrishna, Kausthubham; Bhavani, Gandham S; Shukla, Anju; Nampoothiri, Sheela; Phadke, Shubha R; Park, Mi Jung; Ikegawa, Shiro; Wang, Zheng; Higgs, Martin R; Stewart, Grant S; Jung, Eunyoung; Lee, Myeong-Sok; Park, Jong Hoon; Lee, Eun A; Kim, Hongtae; Myung, Kyungjae; Jeon, Woosung; Lee, Kyoungyeul; Kim, Dongsup; Kim, Ok-Hwa; Choi, Murim; Lee, Han-Woong; Kim, Yonghwan; Cho, Tae-Joon.

Am J Hum Genet ; 104(3): 439-453, 2019 03 07.

Article in English | MEDLINE | ID: mdl-30773278

ABSTRACT

SPONASTRIME dysplasia is a rare, recessive skeletal dysplasia characterized by short stature, facial dysmorphism, and aberrant radiographic findings of the spine and long bone metaphysis. No causative genetic alterations for SPONASTRIME dysplasia have yet been determined. Using whole-exome sequencing (WES), we identified bi-allelic TONSL mutations in 10 of 13 individuals with SPONASTRIME dysplasia. TONSL is a multi-domain scaffold protein that interacts with DNA replication and repair factors and which plays critical roles in resistance to replication stress and the maintenance of genome integrity. We show here that cellular defects in dermal fibroblasts from affected individuals are complemented by the expression of wild-type TONSL. In addition, in vitro cell-based assays and in silico analyses of TONSL structure support the pathogenicity of those TONSL variants. Intriguingly, a knock-in (KI) Tonsl mouse model leads to embryonic lethality, implying the physiological importance of TONSL. Overall, these findings indicate that genetic variants resulting in reduced function of TONSL cause SPONASTRIME dysplasia and highlight the importance of TONSL in embryonic development and postnatal growth.

Subjects

Fibroblasts/pathology , Lethal Genes , Mutation , NF-kappa B/genetics , Osteochondrodysplasias/pathology , Adolescent , Adult , Animals , Cultured Cells , Child , Preschool Child , DNA Damage , Dermis/metabolism , Dermis/pathology , Female , Fibroblasts/metabolism , Humans , Infant , Newborn Infant , Mice , Inbred C57BL Mice , Osteochondrodysplasias/genetics , Exome Sequencing/methods , Young Adult

20.

Engineering Clostridial Aldehyde/Alcohol Dehydrogenase for Selective Butanol Production.

Cho, Changhee; Hong, Seungpyo; Moon, Hyeon Gi; Jang, Yu-Sin; Kim, Dongsup; Lee, Sang Yup.

mBio ; 10(1)2019 01 22.

Article in English | MEDLINE | ID: mdl-30670620

ABSTRACT

Butanol production by Clostridium acetobutylicum is accompanied by coproduction of acetone and ethanol, which reduces the yield of butanol and increases the production cost. Here, we report development of several clostridial aldehyde/alcohol dehydrogenase (AAD) variants showing increased butanol selectivity by a series of design and analysis procedures, including random mutagenesis, substrate specificity feature analysis, and structure-based butanol selectivity design. The butanol/ethanol ratios (B/E ratios) were dramatically increased to 17.47 and 15.91 g butanol/g ethanol for AADF716L and AADN655H, respectively, which are 5.8-fold and 5.3-fold higher than the ratios obtained with the wild-type AAD. The much-increased B/E ratio obtained was due to the dramatic reduction in ethanol production (0.59 ± 0.01 g/liter) that resulted from engineering the substrate binding chamber and the active site of AAD. This protein design strategy can be applied generally for engineering enzymes to alter substrate selectivity. IMPORTANCE: Renewable biofuel represents one of the answers to solving the energy crisis and climate change problems. Butanol produced naturally by clostridia has superior liquid fuel characteristics and thus has the potential to replace gasoline. Due to the lack of efficient genetic manipulation tools, however, clostridial strain improvement has been slower than improvement of other microorganisms. Furthermore, fermentation coproducing various by-products requires costly downstream processing for butanol purification. Here, we report the results of enzyme engineering of aldehyde/alcohol dehydrogenase (AAD) to increase butanol selectivity. A metabolically engineered Clostridium acetobutylicum strain expressing the engineered aldehyde/alcohol dehydrogenase gene was capable of producing butanol at a high level of selectivity.
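
The abstract gives the variants' B/E ratios and their fold increases but not the wild-type ratio itself. As a consistency check, the wild-type value can be back-calculated by dividing each ratio by its fold increase. The sketch below does only that arithmetic; the function name is hypothetical, the variant labels are copied as they appear in the text, and the resulting wild-type figure is an inference, not a number reported in the abstract.

```python
def implied_wild_type_ratio(variant_ratio: float, fold_increase: float) -> float:
    """Back-calculate the baseline B/E ratio from a variant ratio and its fold increase."""
    if fold_increase <= 0.0:
        raise ValueError("fold increase must be positive")
    return variant_ratio / fold_increase


# (label as printed in the abstract, B/E ratio in g butanol/g ethanol, fold increase)
reported = [
    ("AADF716L", 17.47, 5.8),
    ("AADN655H", 15.91, 5.3),
]

implied = {name: implied_wild_type_ratio(ratio, fold) for name, ratio, fold in reported}
for name, value in implied.items():
    print(f"{name}: implied wild-type B/E ratio = {value:.2f} g/g")
```

Both variants point to a wild-type B/E ratio of roughly 3 g butanol/g ethanol, so the two reported fold increases are mutually consistent.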

Subjects

Alcohol Dehydrogenase/metabolism , Aldehyde Dehydrogenase/metabolism , Butanols/metabolism , Clostridium acetobutylicum/enzymology , Clostridium acetobutylicum/metabolism , Metabolic Engineering , Acetone/metabolism , Alcohol Dehydrogenase/chemistry , Alcohol Dehydrogenase/genetics , Aldehyde Dehydrogenase/chemistry , Aldehyde Dehydrogenase/genetics , Catalytic Domain , Ethanol/metabolism , Fermentation , Molecular Dynamics Simulation , Site-Directed Mutagenesis
